Auditory Scene Analysis

In perception and psychophysics, auditory scene analysis (ASA) is a proposed model for the basis of auditory perception. This is understood as the process by which the human auditory system organizes sound into perceptually meaningful elements. The term was coined by psychologist Albert Bregman. The related concept in machine perception is computational auditory scene analysis (CASA), which is closely related to source separation and blind signal separation. The three key aspects of Bregman's ASA model are segmentation, integration, and segregation.


Background

Sound reaches the ear and the eardrum vibrates as a whole. This signal then has to be analyzed in some way. Bregman's ASA model proposes that sounds will be heard either as "integrated" (heard as a whole, much like harmony in music) or as "segregated" into individual components (which leads to counterpoint). For example, a bell can be heard as a single sound (integrated), but some people are able to hear its individual components, that is, to segregate the sound. The same holds for chords, which can be heard either as a "color" or as individual notes. Natural sounds, such as the human voice, musical instruments, or cars passing in the street, are made up of many frequencies, which contribute to the perceived quality (such as timbre) of the sounds. When two or more natural sounds occur at once, all the components of the simultaneously active sounds are received at the same time, or overlapped in time, by the ears of listeners. This presents their auditory systems with a problem: which parts of the sound should be grouped together and treated as parts of the same source or object? Grouping them incorrectly can cause the listener to hear non-existent sounds built from the wrong combinations of the original components.

In many circumstances the segregated elements can be linked together in time, producing an auditory stream. This ability, auditory streaming, can be demonstrated by the so-called cocktail party effect. Up to a point, with a number of voices speaking at the same time or with background sounds, one is able to follow a particular voice even though other voices and background sounds are present. In this example, the ear segregates this voice from the other sounds (which are integrated), and the mind "streams" these segregated sounds into an auditory stream. This skill is highly developed in musicians, notably conductors, who are able to listen to one, two, three or more instruments at the same time (segregating them) and follow each as an independent line through auditory streaming.


Grouping and streams

A number of grouping principles appear to underlie ASA, many of which are related to principles of perceptual organization discovered by the school of Gestalt psychology. These can be broadly categorized into sequential grouping mechanisms (those that operate across time) and simultaneous grouping mechanisms (those that operate across frequency):

* Errors in simultaneous grouping can lead to the blending of sounds that should be heard as separate, the blended sounds having different perceived qualities (such as pitch or timbre) from any of the sounds actually received. For instance, two vowels presented simultaneously may not be identifiable if they are not segregated.
* Errors in sequential grouping can lead, for example, to hearing a word created out of syllables originating from two different voices.

Segregation can be based primarily on perceptual cues or rely on the recognition of learned patterns ("schema-based"). The job of ASA is to group incoming sensory information to form an accurate mental representation of the individual sounds. When sounds are grouped by the auditory system into a perceived sequence, distinct from other co-occurring sequences, each of these perceived sequences is called an "auditory stream". In the real world, if ASA is successful, a stream corresponds to a distinct environmental sound source producing a pattern that persists over time, such as a person talking, a piano playing, or a dog barking. In the lab, however, it is possible to induce the perception of one or more auditory streams by manipulating the acoustic parameters of the sounds.

One example of this is the phenomenon of streaming, also called "stream segregation". If two sounds, A and B, are rapidly alternated in time, after a few seconds the perception may seem to "split" so that the listener hears two streams of sound rather than one, each stream corresponding to the repetitions of one of the two sounds, for example A-A-A-A-, etc. accompanied by B-B-B-B-, etc. The tendency towards segregation into separate streams is favored by differences in the acoustical properties of sounds A and B.
Among the differences classically shown to promote segregation are those of frequency (for pure tones), fundamental frequency (for complex tones), frequency composition, and source location. However, it has been suggested that almost any systematic perceptual difference between two sequences can elicit streaming, provided the speed of the sequence is sufficient. An interactive web page illustrating this streaming and the importance of frequency separation and speed can be found here.
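
The alternating A-B stimulus described above can be synthesized directly. The following is a minimal sketch, assuming NumPy is available; the function names (`tone`, `abab_sequence`) and every default value (500 Hz base frequency, 7-semitone separation, 100 ms tones, 20 ms gaps, 16 kHz sample rate) are illustrative assumptions, not parameters taken from any particular experiment.

```python
import numpy as np

def tone(freq_hz, dur_s, sr=16000, ramp_s=0.005):
    """Pure tone with short raised-cosine on/off ramps to avoid clicks."""
    n = int(round(dur_s * sr))
    t = np.arange(n) / sr
    y = np.sin(2 * np.pi * freq_hz * t)
    r = int(round(ramp_s * sr))
    if r > 0:
        env = 0.5 * (1 - np.cos(np.pi * np.arange(r) / r))
        y[:r] *= env          # fade in
        y[-r:] *= env[::-1]   # fade out
    return y

def abab_sequence(freq_a=500.0, semitones=7.0, tone_s=0.1, gap_s=0.02,
                  repeats=20, sr=16000):
    """Return (samples, freq_b) for an A-B-A-B... pure-tone sequence.

    `semitones` sets the frequency separation between A and B;
    `tone_s` and `gap_s` together set the speed of the sequence.
    """
    freq_b = freq_a * 2 ** (semitones / 12)
    gap = np.zeros(int(round(gap_s * sr)))
    a = np.concatenate([tone(freq_a, tone_s, sr), gap])
    b = np.concatenate([tone(freq_b, tone_s, sr), gap])
    return np.tile(np.concatenate([a, b]), repeats), freq_b

# Hypothetical usage: 20 A-B pairs with the default (assumed) parameters.
samples, freq_b = abab_sequence()
```

Varying `semitones` and the tone and gap durations gives a way to explore the two factors named in the text, frequency separation and speed. The sketch only produces the stimulus; whether a listener hears one stream or two is a perceptual outcome that the code does not predict.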

Andranik Tangian argues that the grouping phenomenon is observed not only in dynamics but in statics as well. For instance, the sensation of a chord is the effect of acoustical data representation rather than physical causality (indeed, a single physical body, like a loudspeaker membrane, can produce an effect of several tones, and several physical bodies, like organ pipes tuned as a chord, can produce an effect of a single tone). From the viewpoint of musical acoustics, a chord is a special kind of sound whose spectrum, the set of partial tones (sinusoidal oscillations), can be regarded as generated by displacements of a single tone spectrum along the frequency axis. In other words, the chord’s interval structure is an acoustical contour drawn by a tone (in dynamics, polyphonic voices are trajectories of tone spectra). This is justified by information theory. If the generative tone is harmonic (that is, has pitch salience), then such a representation is proved to be unique and requires the least amount of memory, i.e. it is the least complex in the sense of Kolmogorov. Since it is simpler than all other representations, including the one where the chord is regarded as a single complex sound, the chord is perceived as a compound. If the generative tone is inharmonic, like a bell-like sound, the interval structure is still recognizable as displacements of a tone spectrum, whose pitch may even be undetectable. This optimal representation-based definition of a chord explains, among other things, the predominance of interval hearing over absolute pitch hearing.
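
The idea of a chord spectrum as displaced copies of one tone spectrum can be illustrated numerically. The sketch below is not Tangian's own procedure; it assumes a logarithmic frequency axis quantized to semitones (so that transposition becomes a shift), a harmonic tone whose partial amplitudes fall off as 1/k, and hypothetical function names. Under those assumptions, the chord spectrum is the tone spectrum convolved with impulses placed at the chord's intervals.

```python
import numpy as np

def harmonic_tone_spectrum(n_partials=5, n_bins=48):
    """Semitone-quantized log-frequency spectrum of one harmonic tone.

    Partial k lies round(12 * log2(k)) semitones above the fundamental,
    which sits in bin 0. The 1/k amplitude roll-off is an assumption.
    """
    spec = np.zeros(n_bins)
    for k in range(1, n_partials + 1):
        spec[int(round(12 * np.log2(k)))] += 1.0 / k
    return spec

def chord_spectrum(tone_spec, intervals):
    """Sum copies of tone_spec displaced by each interval (in semitones).

    This equals the convolution of tone_spec with an "interval structure":
    a vector holding a unit impulse at every interval of the chord.
    """
    structure = np.zeros(max(intervals) + 1)
    structure[list(intervals)] = 1.0
    return np.convolve(structure, tone_spec)

# Hypothetical usage: a major triad (root, major third, fifth).
tone_spec = harmonic_tone_spectrum()
chord = chord_spectrum(tone_spec, (0, 4, 7))
```

In this representation the chord is described by two short items, the tone spectrum and the interval structure, instead of one long list of partials. The sketch only shows the displacement construction; it does not demonstrate the uniqueness or minimal-complexity results mentioned in the text.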


Experimental basis

Many experiments have studied the segregation of more complex patterns of sound, such as a sequence of high notes of different pitches, interleaved with low ones. In such sequences, the segregation of co-occurring sounds into distinct streams has a profound effect on the way they are heard. Perception of a melody is formed more easily if all its notes fall in the same auditory stream. We tend to hear the rhythms among notes that are in the same stream, excluding those that are in other streams. Judgments of timing are more precise between notes in the same stream than between notes in separate streams. Even perceived spatial location and perceived loudness can be affected by sequential grouping. While the initial research on this topic was done on human adults, recent studies have shown that some ASA capabilities are present in newborn infants, showing that they are built-in, rather than learned through experience. Other research has shown that non-human animals also display ASA. Currently, scientists are studying the activity of neurons in the auditory regions of the cerebral cortex to discover the mechanisms underlying ASA.


See also

* Illusory discontinuity
* Phonemic restoration effect
* Theory of indispensable attributes

